7 research outputs found

    Automatically Building a Corpus for Sentiment Analysis on Indonesian Tweets


    Detecting Controversial Articles on Citizen Journalism

    A person's understanding of and stance on a controversial topic can be influenced by the news and articles they read every day. Unfortunately, readers usually do not realize that they are reading controversial articles. In this paper, we address the problem of automatically detecting controversial articles in citizen journalism media. To solve the problem, we employ a supervised machine learning approach with several hand-crafted features that exploit linguistic information, article metadata, structural information in the commentary section, and sentiment expressed in the body of an article. The experimental results show that our proposed method performs the addressed task effectively. The best performance so far is achieved when we use all proposed features with Logistic Regression as our model (82.89% accuracy). Moreover, we found that information from the commentary section (structural features) contributes most to the classification task.

    Pengembangan Model untuk Mendeteksi Kerusakan pada Terumbu Karang dengan Klasifikasi Citra (Developing a Model to Detect Coral Reef Damage Using Image Classification)

    The abundant biodiversity of coral reefs in Indonesian waters is a valuable asset that needs to be preserved. Rapid climate change and uncontrolled human activities have led to the degradation of coral reef ecosystems, including coral bleaching, which is a critical indicator of coral health conditions. Therefore, this research aims to develop an accurate classification model to distinguish between healthy corals and corals experiencing bleaching. This study utilizes a specialized dataset consisting of 923 images collected from Flickr using the Flickr API. The dataset comprises two distinct classes: healthy corals (438 images) and bleached corals (485 images). These images have been resized to a maximum of 300 pixels in width or height, whichever is larger, to maintain consistent sizes across the dataset. The method employed in this research involves the use of machine learning models, particularly convolutional neural networks (CNN), to recognize and differentiate visual patterns associated with healthy and bleached corals. In this context, the dataset can be used to train and test various classification models to achieve optimal results. By leveraging the ResNet model, it was found that a from-scratch ResNet model can outperform pretrained models in terms of precision and accuracy. The success in developing accurate classification models will greatly benefit researchers and marine biologists in gaining a better understanding of coral reef health. These models can also be employed to monitor changes in the coral reef environment, thereby making a significant contribution to conservation and ecosystem restoration efforts that have far-reaching impacts on life. Comment: in Indonesian language
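
    The resizing rule described in the abstract (longer side capped at 300 pixels) can be illustrated with a minimal sketch. The function name `fit_within`, the rounding, and the choice not to upscale smaller images are assumptions made for illustration; the abstract does not say how the authors implemented this step.

```python
def fit_within(width: int, height: int, max_side: int = 300) -> tuple[int, int]:
    """Return new (width, height) whose longer side is at most max_side.

    The aspect ratio is preserved up to rounding. Hypothetical helper:
    the name and the no-upscaling behaviour are assumptions, not taken
    from the paper.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    longer = max(width, height)
    if longer <= max_side:
        # Assumption: images already within the limit are left unchanged.
        return width, height

    scale = max_side / longer
    # Guard against a side rounding down to zero for extreme aspect ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

    For example, under these assumptions a 600 x 400 image would map to 300 x 200, while a 200 x 100 image would be left as is.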

    Modelling search and session effectiveness

    © 2020 Alfan Farizki Wicaksono. Search effectiveness metrics are used to quantify the quality of a ranked list of search results relative to a query. One line of argument suggests that incorporating user behaviour into the measurement of search effectiveness via a user model is useful, so that the metric scores reflect what the user has experienced during the search process. A wide range of metrics has been proposed, and many of these metrics correspond to user models. In reality, users often reformulate their queries during the course of the session. Hence, it is desirable to involve both query- and session-level behaviours in the development of model-based metrics. In this thesis, we use interaction data from commercial search engines and laboratory-based user studies to model query- and session-level search behaviours, and user satisfaction; to inform the method for evaluation of search sessions; and to explore the interaction between user models, metric scores, and satisfaction. We consider two goals in session evaluation. The first goal is to develop an effectiveness model for session evaluation; and the second goal is to establish a fitted relationship between individual query scores and session-level satisfaction ratings. To achieve the first goal, we investigate factors that affect query- and session-level behaviours, and develop a new session-based user model that provides a closer fit to the observed behaviour than do previous models. This model is then used to devise a new session-based metric, sINST. In regard to the second goal, we explore variables influencing session-level satisfaction, and suggest that the combination of both query positional and quality factors provides a better correlation with session satisfaction than those based on query position alone. Based on this observation, we propose a novel query-to-session aggregation function that is useful for scoring sessions when sequences of query reformulations are observed. We also propose a meta-evaluation framework that allows metric comparisons based on empirical evidence derived from search interaction logs, and investigate the connection between predicted behaviour and observed behaviour, and between metric scores and user satisfaction at both the query and session levels.
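
    To make the idea of a metric that corresponds to a user model concrete, here is a minimal sketch of rank-biased precision (RBP), a well-known query-level metric in which the user moves from one result to the next with a fixed persistence probability. RBP is not named in the abstract and is not the thesis's sINST metric; it is used here only as an illustration, and the function name and default persistence value are assumptions.

```python
def rbp(relevance: list[float], persistence: float = 0.8) -> float:
    """Rank-biased precision for one ranked list.

    relevance[i] is the gain (0.0 to 1.0) of the result at rank i + 1.
    persistence is the probability that the modelled user continues
    from one result to the next. Illustrative only: this is not sINST.
    """
    if not 0.0 <= persistence < 1.0:
        raise ValueError("persistence must be in [0, 1)")

    # The user inspects rank i + 1 with probability persistence ** i;
    # the (1 - persistence) factor normalises the score into [0, 1].
    return (1.0 - persistence) * sum(
        gain * persistence**i for i, gain in enumerate(relevance)
    )
```

    A session-level metric of the kind the abstract describes would additionally need a model of query reformulation, which this query-level sketch does not attempt.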

    Detection of Negative Content (Hoax) On Microblog Data That Contains Covid-19 Information

    Over the past few years, the dissemination of information has increased, especially since the advent of social media. Among the information in circulation, some contains negative content, or hoaxes, which have harmful effects such as divisions caused by incorrect information. According to the 2018 Kominfo performance report, Twitter is the largest contributor to the spread of hoaxes. To reduce the impact of their spread, a method is needed to detect hoaxes on Twitter so that preventive action can be taken, such as taking down hoax tweets. The purpose of this research is to develop a model that can detect negative content (hoaxes) automatically and to examine the correlation between hoax content and sentiment orientation. The result of this study is a machine learning-based model using a decision tree algorithm, with an accuracy of 97.2%, a precision of 85.4, a recall of 81.4, and an F1-score of 93. In addition, the analysis shows that tweets identified as hoaxes by the model are dominated by positive sentiment, which accounts for 52.64% of the data identified as hoaxes.
